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Q_- Abstract 

■^ , The outcome statistics of an informationally complete quantum measurement 

cd ■ for a system in a given state can be used to evaluate the ensemble expectation 

^' of any linear operator in the same state, by averaging a function of the 

outcomes that depends on the specific operator. Here we introduce two 

P^ ! novel data-processing strategies, non-linear in the frequencies, which lead to 

^ ' faster convergence to theoretical expectations. 

CX5 ' 
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o. 

OO , 1. Introduction 

O 

In Quantum Mechanics measuring a single observable provides only par- 
tial information about the state of the measured system. According to the 
;h ' Born interpretation, the quantum state is a rule for evaluating the outcome 

probabilities in all conceivable measurements, and a complete information 



X 



about the quantum state requires a thorough outcome statistics for a quo- 
rum of observables, or for a suitable informationally complete measurement 
(shortly info-complete) [l|, 1^ , in conjunction with a suitable data-processing, 
as it is done in quantum tomography (for a review see Ref. |3|). There 
are two main classes of approaches in quantum tomography: a) averaging 
"patterns functions" a method initiated in Ref. |j]; b) Maximum likelihood 
techniques J5| 
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Method a) has the advantage of providing any expectation value, e.g. a 
single density matrix element, without the need of estimating the entire den- 
sity operator. However, the estimated full matrix is not necessarily positive, 
which is not a serious drawback, since the non positivity falls within a small 
fluctuation for large numbers of data. 

Method b) has the advantage of providing a positive density operator, 
with smaller fluctuations, however, it has the more serious drawback of need- 
ing to estimate the full density matrix, while is exponentially large versus the 
number of systems, and, in the infinite dimensional case needs a dimension- 
ality cutoff which introduce a bias that is under control only if there is some 
prior knowledge of the state. 

In a recent paper J6| the optimal data-processing for evaluating ensemble 
averages from experimental outcomes was derived for a completely general 
setting within a Bayesian scheme that assumes a prior probability distri- 
bution of states. Using as optimality criterion the rate of estimated-to- 
theoretical convergence of averages, the optimal data-processing itself de- 
pends on the prior distribution of states. 

The purpose of the present paper is to exploit the dependence of the 
optimal data-processing on the prior distribution of states, in order to im- 
prove the convergence rate using an adaptive data-processing scheme. We 
will consider info-complete measurements — more generally than a quorum of 
observables — whose statistics allows to reconstruct all possible ensemble av- 
erages. Estimation of the quantum state itself is equivalent to the estimation 
of all possible ensemble averages. We will adopt the natural figure of merit 
used in Ref. J6|], which, in the present context, represents the estimated- 
to-theoretical convergence rate (in Hilbert-Schmidt distance) of the state. 
As we will see, exploiting the dependence of the optimal data-processing 
on the prior state leads to two different data processing strategies, which 
both improve the convergence rate compared to the standard tomographic 
procedures, and are easily implementable and computationally efficient: 

Method 1 (Bayesian iterative procedure): Bayesian update of the prior 
distribution after the first state reconstruction, with subsequent itera- 
tion of the optimization. 

Method 2 (Frequentist approach): replace the theoretical probability 
distribution of the info-complete in the optimal data-processing with 
the experimental frequencies. 



We will see that numerical simulations carried out with both methods show 
relevant improvement of convergence compared to the plain non adaptive 
processing of Ref. [6|]. 

The paper is organized as follows. In Section [2] we re-derive the optimal 
data-processing for given prior distribution of Ref. |6| within an improved 
theoretical framework. In Sections [3] and H] we introduce Methods 1 and 
2, respectively. Finally, in Section 0, we present numerical simulations for 
testing both methods in comparison with the original plain non adaptive 
data-processing, and in Section [6] we end the paper with concluding remarks. 

2. Optimization of the data processing 

In the modern formulation of Quantum Mechanics, the state of a quantum 
system associated to a rf-dimensional Hilbert space CK ~ C^ is represented by 
a density matrix, namely a positive operator p > with Tr[p] = 1. The Born 
formula provides the probabilities of outcomes in a quantum measurement 
in terms of the state p as follows 

p{z\p):=TT[pP,], (1) 

where the POVM (Pj) (Positive Operator Valued Measure) is a set of (gen- 
erally non orthogonal) positive operators Pj > resolving the identity as 
^i^i Pi = I, thus guaranteeing positivity and normalization of probabilities. 
The linear span of the POVM elements Pi, defined as S := Span{Pj}i^j^„, 
is a linear subspace of the space £j(CK) of linear operators on "K, and we 
will take as a canonical basis in £;(3-C) the operators |?Ti)(n|, where \n) is 
an orthonormal basis thus representing operators X by the vectors of their 
matrix elements Xjn,n = (Tra|x|n). A POVM is info-complete if S = ^{"K), 
namely all operators X G XL(J{) can be expanded on the POVM elements, 
and it is possible to determine all ensemble averages {X)p, as in Quantum 
Tomography. For each complex operator X G § the following decomposition 
holds 

N 

X = 5^/,[X]P„ (2) 

j=i 
where fi[X] is not unique if the set {Pj} is over-complete. 

With the above expressions we can write the ensemble average of X as 
follows: 

N 

{X),:=TT[Xp] = J2mW\p), (3) 



with the following statistical error 

N 



{^X')^:=J2\MX]\M^\P)-\{X),\' (4) 



1=1 



In a Bayesian scheme one has an a priori ensemble £ := {pj, Pi} of possible 
states Pi of the quantum system occurring with probability pi. We want 
to minimize the average statistical error on £ in the determination of the 
expectation value of X, namely the variance 



N 



{^X')^:=J2\MX]\M^\Ps)-\{X)\\, (5) 



i=l 



where p^ = ^iPiPi and |(X)P^ = X]iPi|Tr[pj^]P is the squared modulus 
of the expectation of X averaged over the states in the ensemble (since this 
term depends only on the ensemble it will be neglected from now on). Using 
Eq.([T]) the first term in Eq.([5]) can be rewritten as 



N 



^f{x)■.= J2\ux]\'np^Ps]. (6) 



i=l 



Given a POVM {Pi), it is possible to define a linear map A from an 
abstract N-dimensional space % of coefficient vectors c G 3C to ^^(M), with 
range §: 



N 



KC = Y,C^P^, (7) 



i=l 



SO that using the canonical basis in %, A has matrix elements Amn,i = {Pi)mn- 
A generalized inverse (shortly g-inverse) of A is any matrix F representing 
linear operators from L{'K) to % such that the following identity holds 

AFA = A (8) 

Notice that the matrix elements (Fj ^n) of F define a set of operators Di with 
matrix elements (-Di)mn := ^Imn- The role of g-inverse F is assessed by the 
two following important theorems 

Theorem 1. The following statements are equivalent 



1. r is a g-inverse of A 

2. For all y G Rng(A), x = Fy is a solution of the equation Ax = y. 

Proof. See Ref. 0. 

Theorem 2. For all g-inverse T of A all solutions o/Ax = y are of the form 

x=ry+(J-rA)z, (9) 

with arbitrary z. 

Proof. See Ref. 0- 

We now define a norm in % as follows 

N 

:=5^|ci|Vii, (10) 



llcP 



i=l 



where ttj^ = SijiTa is a positive matrix which is diagonal in the canonical basis 
in %. In terms of vr we define the minimum norm g-inverses F that satisfy 



ttFA = A^FV. (11) 

Notice that the present definition of minimum norm g-inverse requires that 
the norm is induced by a scalar product (in our case a ■ b := ^j^^ a*Tiiibi). 
We will now prove the following crucial theorem 

Theorem 3. The following assertions are equivalent 

1. T is a minimum norm g-inverse of A. 

2. For all y G Rng(A), x = Fy is a solution of the equation Ax = y with 
minimum norm. 

Proof. We first prove that [T] ^ [21 For F g-inverse of A, one has due to 
Theorem [2] 

||Fy+(/-FA)z||2 
= [ytFt + zt(/ - AtFt)]7r[Fy + (J - FA)z] 

=l|ry|i: + ||(/-rA)z||^ ^''^ 

+ z^(/ - ATF^)7rFy + yTF%(J - FA)z. 



Since by hypothesis y G Rng(A), then y = Au for some u in %. For a 
minimum norm g-inverse F as in the hypothesis, due to Eq. fill I) one has 



z^I - A^T^)nTAu + uU^FV(/ - FA)z = 
z\l - A^T^)A^Thu + uWTA{I - FA)z = 0. 



(13) 



where the last equahty is due to Eq. ([8]). Finally, this proves that 

||Fy + (/ - FA)z||^ = ||Fy||^ + ||(/ - FA)z||^ > ||Fy||^ (14) 

namely the solution x = Fy is minimum-norm. 

Now we prove [2] ^ [H If x = Fy is a solution of Ax = y for all y G Rng(A), 
by Theorem[T]F is a g-inverse of A, namely AFA = A. Then if Fy is minimum 
norm solution of |Ax = y| then due to Theorem [2] 

||Fy||^<||Fy + (/-FA)z||^ (15) 

for all y G Rng(A) and for all z one has 

< 1(1 - FA)z||2 + z^(/ - A^Ft)7rFy + y^FV(/ - FA)z. (16) 

Since an arbitrary y G Rng(A) is Au for arbitrary u, the second term in 
Eq. (TT6|) becomes 

zt(/ - A^F'f)7rFAu + uU^FV(/ - FA)z = 

23fJ(z^(/-A^F"f)7rFAu). ^ ^ 

Let us keep z fixed and multiply u by an arbitrary a. If the expression in 
Eq. (fT71) is not vanishing then taking \a\ sufficiently large, for suitable phase 
one can contradict the bound in Eq. (fT6|) . hence 3fi(z"''(/ — A''"F"'')7rFAu) = 
for all u and z and by the same reasoning Q=(z''^(J — A^F^)7rFAu) = for all 
u and z. We can then conclude that (/ - AtFt)7rFA = A^T^n{I - FA) = 0, 
and consequently vrFA = A"'^F"'^7r ■ 

Using Eq. (fTTj) . and considering that Sj(X) is the norm of the vector 
of coefficients f[X] with tth = Tr[PjPe], it has been proved in |6| that the 
minimum noise is achieved by F corresponding to the set of operators D^ 
given by 

N 

DT ■■= A. - J]{[(/ - A/)7r(/ - M)]hMh,A, (18) 

3=1 



where Aj is the set of operators corresponding to the Moore-Penrose g-inverse 
r^p of A, satisfying the properties 

r A = A'l'r''" r Ar = r r^ a^ = Ar (i9) 

^ mp^^ ji J- ,yjp5 J- mp^^'- nip '- mpi ^ mp mpi \^'-' J 

and M := F^pA = M^ = M^. The symbol X^ denotes the Moore-Penrose 
g-inverse of X. It is indeed easy to verify that 

Topt := Pmp - [(/ - M)7r(/ - M)]^'KMTmp (20) 

satisfies Eq. fITT]) . Notice that being Popt minimum norm independently of 
X, the statistical error is minimized by the same choice D°^ for all operators 
X. 

When a X-outcomes POVM on a d-dimensional Hilbert space CK ~ C^ is 
info-complete the state p can be written as 

N 

p = Y,Dip{i\p), (21) 

where Di corresponds to any g-inverse P. It is then possible to reconstruct 
any state p using the statistics from measurements: 

N N 

p = Y,Pi^\p)Di = Y.^iDr, (22) 



where Vi = ^^ is the experimental frequency of the i-th outcome, rij being 
the number of occurrence of the i-th outcome, and ritot = Tlii'^^i- By the law 
of large numbers we have that lim z/j = p{i\p). However, the convergence 

rate of p to p depends on the choice of D^. It turns out |9[ that the choice -D°^*, 
corresponding to Popt, is the one with the fastest convergence (in average over 
all possible experimental outcomes) in the Hilbert-Schmidt distance, defined 
as follows 

||p-p2||^=Tr[(p-p)2]. (23) 

This can be easily proved considering that the Hilbert-Schmidt distance can 
be written as the sum of the variances 5{\m){n\Y, and all of the summands 
are minimized by the choice of minimum-norm P = P^pt . 



3. The Bayesian iterative procedure 

In this Section we describe the iterative estimation procedure based on 
the update of the prior information by means of the state reconstruction 
provided by experimental data. Here we provide an algorithmic description 
of the procedure, that yields a self-consistent solution: 

1. The protocol starts with the choice of a priori ensemble £ := {piiPi\ 
(where pi are states and pi are their prior probabilities), with the cor- 
responding density matrix p^") '■= Pe ~ "l^iPiP'h 6. g. the one of the 
uniform ensemble of all pure states p^^^ = I/d. 

2. Using p^^^ it is possible to calculate the diagonal matrix with the prob- 
ability of the different outcomes: 



TT, 



U •" 



5,,Tr[P,p("^] (24) 



3. Using TTij in Eq. (llSp we can find the optimal g-inverse Fopt corresponding 
to D°^ associated with p^^\ 

4. Now the initial a priori density matrix p^^^ = p^ will be updated as 
follows: 

N 

p('^ = J2u.,Dr (25) 

4 = 1 

5. If p^^^ = p^^^ within a given tolerable error e then the average input state 
is p := p^^^ and the procedure stops. 

6. Otherwise after setting p^'^^ := p^^^ the procedure will go back to the step 
2. 

It is important to remark that at each step the matrices p'-^-* and D"^ 
are automatically self-adjoint and normalized: Tr[p*^^^] = 1 since for all i: 
Tr[D°^ ] = 1 (6|, however, they are not necessarily positive. 

This protocol in principle provides reliable state reconstructions, however, 
its iterative character makes it less efficient than the one introduced in next 
Section, since at any iterative step one has to calculate the Moore-Penrose g- 
inverse in Eq. (TT8|) . which is typically a time-consuming operation, especially 
for POVM's with a large number A^ of outcomes. 



4. The frequent ist approach 

In this Section we introduce the second processing strategy, based on the 
substitution of prior probabihties by experimental frequencies in Eq. fITT]) . 
While the previous protocol is essentially a Bayesian update, in this case the 
the processing relies on the law of large numbers, namely on the fact that 
lim^^^^^oo T-'i = p(^|p)) where the limit has to be understood in probability. We 
name this approach frequentist because it fits the frequentist interpretation 
of probabilities as approximations of experimental frequencies, avoiding prior 
probabilities, which are the signature of the Bayesian approach. 

If we substitute the metric matrix vr in the Eq. (ITU]) with the diagonal 
matrix of the frequencies z/j, we get: 

z/EA = A^EV (26) 

and following the same proof as for Eq. flTK]l we obtain the following expression 
of the optimal g-inverse E^ satisfying condition Eq. (l26l) . in terms of the 
corresponding operators D^ 

N 

Df^ := A, - J2m - MHI - M)]^uMy,A, (27) 

j=i 

that is non linear in the outcomes frequencies due to the Moore-Penrose 
g-inverse of (J - M)iy{I - M). 

This protocol has the advantage that it requires only one evaluation of 
Moore-Penrose g-inverse, and it is then much faster — in terms of computa- 
tional resources — than the iterative one introduced in the previous Section. 
However, here generally TrfZ^^ ] ^ 1, whence in addition to positivity of the 
estimated state p, also the normalization constraint is lost (but not hermitic- 

ity). 

5. Numerical simulations 

In order to test these two methods and to compare their performances 
with the plain un-updated procedure some Monte Carlo simulation have ben 
performed. As an example, we considered the info-complete POVM com- 
posed by the following six elements 

P±, = ^(/±a,), (28) 

o 



cq = I and a - 
cal state is 

P 



cr.. 



O'y^O'z 



denoting the usual Pauli matrices. The theoreti- 



7 



:0-', 



:Cr„ 



:0-; 



(29) 



The simulation consists in 1000 experiments, each consisting in 1000 
single-shot measurements, simulated by POVM events extraction according 
to the theoretical probabilities p{±i\p) := Tr[P-tjp]. The number of iterations 
in the Bayesian processing is 10. 
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Figure 1: Histograms representing the number of experiments versus the Hilbert-Schmidt 
distance of the resulting state from the theoretical one. Upper plot: the light gray bars 
correspond to the Bayesian processing, the dark grey correspond to the plain processing 
without updating, the white part is the overlap. Lower plot: the dark grey bars correspond 
to the frequentist processing method. Both plots show a well visible shift of the histograms 
corresponding to the new adaptive methods towards small errors compared to the plain 
processing without update. [For other data concerning plots see text.] 

In Fig. [T]we show the histograms representing the number of experiments 
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Procedure {H.S.dist.) a A{{H.S.dist.)) A(ct) 

Plain (no update) OM (TOS 

Bayesian 0.05 0.02 -17% -33.3% 

Frequentist 0.05 0.02 -17% -33.3% 

Table 1: Average Hilbert-Schmidt distance, variance a of the histogram, and relative 
improvements compared to the plain un-updated procedure of the new data-processing 
strategies presented in the paper. [For other data concerning this table see text.] 



as a function of the Hilbert-Schmidt distance of the resulting state p from 
the theoretical one p. The plots show a well evident shift of the histograms 
for both new processing methods towards small errors compared to the plain 
processing without updating. In Table [1] we summarize these considerations 
by showing the average Hilbert-Schmidt distance obtained with the three 
kinds of processing, along with the corresponding variance and the relative 
improvement of the figure of merit. 

6. Conclusions 

In conclusion, we have presented two novel data-processing strategies 
to improve convergence of estimation of ensemble average via info-complete 
measurements. The two approaches adaptively update the data-processing 
functions in a Bayesian and frequentist fashion, respectively, by substituting 
the prior probabilities with experimental frequencies (frequentist) and the 
prior state with the updated state (Bayesian). The two methods have been 
tested by numerical simulations, and both showed improved convergence rate 
compared to the original plain un-updated strategy. Clearly, further improve- 
ment is possible using both procedure together, however, this would be an 
higher-order correction. 
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